On the Use of Supervised Learning Method for Authorship Attribution
Abstract
In this paper we investigate the use of a supervised learning method for authorship attribution, that is, for identifying the author of a text. We propose a new, simple, and efficient method based solely on counting the occurrences of each alphabetic letter in the text, instead of the traditional classification properties, such as the content of the text and the style of the author, which fall into four feature categories: lexical, syntactic, structural, and content-specific. For classification we apply a spherical classification method. We apply the proposed technique to the works of two Italian writers, Dante Alighieri and Brunetto Latini. The spherical classifier discriminated between the selected authors with fairly high reliability. Finally, the results are compared with those obtained with a standard Support Vector Machine classifier.
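The letter-counting feature described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `letter_counts` and `letter_frequencies`, the restriction to the 26 unaccented Latin letters, the case folding, and the normalization to relative frequencies are all assumptions made here for illustration.

```python
from collections import Counter
import string

# Assumption: features are the 26 unaccented Latin letters, case-insensitive.
# Accented characters, digits, spaces, and punctuation are ignored.
ALPHABET = string.ascii_lowercase


def letter_counts(text):
    """Return a 26-element list with the number of occurrences of each letter a-z."""
    counts = Counter(ch for ch in text.lower() if ch in ALPHABET)
    return [counts[ch] for ch in ALPHABET]


def letter_frequencies(text):
    """Return relative letter frequencies (hypothetical normalization step).

    Dividing by the total letter count makes texts of different lengths
    comparable; an empty text yields an all-zero vector.
    """
    counts = letter_counts(text)
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


if __name__ == "__main__":
    sample = "Nel mezzo del cammin di nostra vita"
    vector = letter_counts(sample)
    # Show only the letters that actually occur in the sample.
    print({ch: n for ch, n in zip(ALPHABET, vector) if n})
```

Each text then becomes a fixed-length numeric vector that any supervised classifier can consume, which is what makes the feature cheap compared with lexical or syntactic analysis.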
Similar resources
Kernel Methods and String Kernels for Authorship Analysis
This paper presents our approach to the PAN 2012 Traditional Authorship Attribution tasks and the Sexual Predator Identification task. We approached these tasks with machine learning methods that work at the character level. More precisely, we treated texts as just sequences of symbols (strings) and used string kernels in conjunction with different kernel-based learning methods: supervised and ...
Towards a better understanding of Burrows's Delta in literary authorship attribution
Burrows’s Delta is the most established measure for stylometric difference in literary authorship attribution. Several improvements on the original Delta have been proposed. However, a recent empirical study showed that none of the proposed variants constitute a major improvement in terms of authorship attribution performance. With this paper, we try to improve our understanding of how and why ...
Style based Authorship Attribution on English Editorial Documents
The aim of authorship attribution is to identify the author(s) of unknown document(s). Every author has a unique writing style. The present paper identifies the unique style of an author (or authors) using lexical stylometric features. The lexical feature vectors of various authors are used in supervised machine learning algorithms for predicting the author of the unknown document. The highest ...
A Web-Based Self-training Approach for Authorship Attribution
Like any other text categorization task, authorship attribution requires a large number of training examples. These examples, which are easily obtained for most tasks, are particularly difficult to obtain in this case. Based on this fact, in this paper we investigate the possibility of using Web-based text mining methods for the identification of the author of a given poem. In particular, ...
Detecting authorship deception: a supervised machine learning approach using author writeprints
We describe a new supervised machine learning approach for detecting authorship deception, a specific type of authorship attribution task particularly relevant for cybercrime forensic investigations, and demonstrate its validity on two case studies drawn from realistic online data sets. The core of our approach involves identifying uncharacteristic behavior for an author, based on a writeprint ...
Publication date: 2012